33 research outputs found

    Modeling the Temporal Nature of Human Behavior for Demographics Prediction

    Mobile phone metadata is increasingly used for humanitarian purposes in developing countries as traditional data is scarce. Basic demographic information is however often absent from mobile phone datasets, limiting the operational impact of the datasets. For these reasons, there has been a growing interest in predicting demographic information from mobile phone metadata. Previous work focused on creating increasingly advanced features to be modeled with standard machine learning algorithms. We here instead model the raw mobile phone metadata directly using deep learning, exploiting the temporal nature of the patterns in the data. From high-level assumptions we design a data representation and convolutional network architecture for modeling patterns within a week. We then examine three strategies for aggregating patterns across weeks and show that our method reaches state-of-the-art accuracy on both age and gender prediction using only the temporal modality in mobile metadata. We finally validate our method on low activity users and evaluate the modeling assumptions. Comment: Accepted at ECML 2017. A previous version of this paper was titled 'Using Deep Learning to Predict Demographics from Mobile Phone Metadata' and was accepted at the ICLR 2016 workshop

    Pool inference attacks on local differential privacy: quantifying the privacy guarantees of Apple's Count Mean Sketch in practice

    Behavioral data generated by users’ devices, ranging from emoji use to pages visited, are collected at scale to improve apps and services. These data, however, contain fine-grained records and can reveal sensitive information about individual users. Local differential privacy has been used by companies as a solution to collect data from users while preserving privacy. We here first introduce pool inference attacks, where an adversary has access to a user’s obfuscated data, defines pools of objects, and exploits the user’s polarized behavior in multiple data collections to infer the user’s preferred pool. Second, we instantiate this attack against Count Mean Sketch, a local differential privacy mechanism proposed by Apple and deployed in iOS and Mac OS devices, using a Bayesian model. Using Apple’s parameters for the privacy loss ε, we then consider two specific attacks: one in the emojis setting, where an adversary aims at inferring a user’s preferred skin tone for emojis, and one against visited websites, where an adversary wants to learn the political orientation of a user from the news websites they visit. In both cases, we show the attack to be much more effective than a random guess when the adversary collects enough data. We find that users with high polarization and relevant interest are significantly more vulnerable, and we show that our attack is well-calibrated, allowing the adversary to target such vulnerable users. We finally validate our results for the emojis setting using user data from Twitter. Taken together, our results show that pool inference attacks are a concern for data protected by local differential privacy mechanisms with a large ε, emphasizing the need for additional technical safeguards and the need for more research on how to apply local differential privacy for multiple collections.

    When and where do you want to hide? Recommendation of location privacy preferences with local differential privacy

    In recent years, it has become easy to obtain precise location information. However, acquiring such information carries risks such as individual identification and leakage of sensitive information, so the privacy of location information must be protected. For this purpose, people should know their location privacy preferences, that is, whether or not they are willing to release location information at each place and time. However, it is not easy for each user to make such decisions, and it is troublesome to set a privacy preference each time. Therefore, we propose a method to recommend location privacy preferences for decision making. Compared to the existing method, our method improves recommendation accuracy by using matrix factorization and strictly preserves privacy through local differential privacy, whereas the existing method does not achieve a formal privacy guarantee. In addition, we found the best granularity of a location privacy preference, that is, how to express the information in location privacy protection. To evaluate and verify the utility of our method, we integrated two existing datasets to create a dataset rich in terms of the number of users. From the results of the evaluation using this dataset, we confirmed that our method can predict location privacy preferences accurately and that it provides a suitable method to define the location privacy preference.

    The rise of consumer health wearables: promises and barriers

    Will consumer wearable technology ever be adopted or accepted by the medical community? Patients and practitioners regularly use digital technology (e.g., thermometers and glucose monitors) to identify and discuss symptoms. In addition, a third of general practitioners in the United Kingdom report that patients arrive with suggestions for treatment based on online search results. However, consumer health wearables are predicted to become the next “Dr Google.” One in six (15%) consumers in the United States currently uses wearable technology, including smartwatches or fitness bands. While 19 million fitness devices are likely to be sold this year, that number is predicted to grow to 110 million in 2018. As the line between consumer health wearables and medical devices begins to blur, it is now possible for a single wearable device to monitor a range of medical risk factors. Potentially, these devices could give patients direct access to personal analytics that can contribute to their health, facilitate preventive care, and aid in the management of ongoing illness. However, how this new wearable technology might best serve medicine remains unclear.

    Understanding the interplay between social and spatial behaviour

    According to personality psychology, personality traits determine many aspects of human behaviour. However, validating this insight in large groups has been challenging so far, due to the scarcity of multi-channel data. Here, we focus on the relationship between mobility and social behaviour by analysing trajectories and mobile phone interactions of ∼1000 individuals from two high-resolution longitudinal datasets. We identify a connection between the way in which individuals explore new resources and exploit known assets in the social and spatial spheres. We show that different individuals balance the exploration-exploitation trade-off in different ways and we explain part of the variability in the data by the big five personality traits. We point out that, in both realms, extraversion correlates with the attitude towards exploration and routine diversity, while neuroticism and openness account for the tendency to evolve routine over long time-scales. We find no evidence for the existence of classes of individuals across the spatio-social domains. Our results bridge the fields of human geography, sociology and personality psychology and can help improve current models of mobility and tie formation.

    A survey of results on mobile phone datasets analysis


    Detrimental network effects in privacy: a graph-theoretic model for node-based intrusions

    Despite proportionality being one of the tenets of modern data protection laws such as the EU General Data Protection Regulation and Law Enforcement Directive, we currently lack a robust analytical framework to evaluate the reach of modern data collections and the network effects at play. We here propose a graph-theoretic model and notions of node- and edge-observability to quantify, in the form of attacks, the reach of networked data collections. We first prove closed-form expressions for our metrics and quantify the impact of the graph’s structure on observability. Second, using our model, we quantify how (1) from 270,000 compromised accounts, Cambridge Analytica collected 68.0M Facebook profiles; (2) from surveilling 0.01% of the nodes in a mobile phone network, a law-enforcement agency could observe 18.6% of all communications; and (3) an app installed on 1% of smartphones could monitor the location of half of the London population through close proximity tracing. We hope this work will help better quantify the reach and therefore proportionality of data collection mechanisms moving forward.
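
    The observability notions in the abstract above can be illustrated with a minimal sketch. The definitions used here are assumptions made for illustration, not the paper's formal ones: an edge counts as observed when at least one endpoint is compromised, and a node counts as observed when it is compromised or adjacent to a compromised node. The function names and the toy star graph are hypothetical.

```python
def edge_observability(edges, compromised):
    """Fraction of edges with at least one compromised endpoint (assumed definition)."""
    if not edges:
        return 0.0
    observed = sum(1 for u, v in edges if u in compromised or v in compromised)
    return observed / len(edges)


def node_observability(edges, compromised):
    """Fraction of nodes that are compromised or neighbours of a compromised node
    (assumed one-hop definition)."""
    nodes = set(compromised)
    observed = set(compromised)
    for u, v in edges:
        nodes.update((u, v))
        if u in compromised:
            observed.add(v)
        if v in compromised:
            observed.add(u)
    return len(observed) / len(nodes) if nodes else 0.0


# Toy star graph: hub 0 connected to leaves 1..4 (illustrative data only).
star = [(0, 1), (0, 2), (0, 3), (0, 4)]

hub_edges = edge_observability(star, {0})   # compromising the hub
leaf_edges = edge_observability(star, {1})  # compromising a single leaf
leaf_nodes = node_observability(star, {1})
```

    On the toy star graph, compromising the hub exposes every edge, while compromising one leaf exposes a single edge and two of the five nodes, which hints at how graph structure drives the reach of a small set of compromised nodes.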